SUMMARY


This  technical  report  is  the  first  in  a  series  that  describes  the 
results  of  empirical  studies  of  the  efficacy  of  adaptive  automation  (or  adaptive 
functional  allocation)  on  the  performance  of  flight-relevant  tasks.  The  studies 
support  a  program  of  research  whose  goal  is  to  identify  and  develop  human 
performance-based  design  principles  for  the  application  of  adaptive  automation 
technology.  The  investigations  are  also  designed  to  evaluate  and  validate 
alternative  adaptive  automation  concepts. 

The  present  investigation  had  three  major  objectives:  (1)  Develop 
software  that  provides  a  robust  and  sensitive  set  of  flight-relevant  tasks.  (2) 
Provide  operator  performance  data  for  each  flight  task  under  normal  (manual), 
single-task  conditions.  (3)  Evaluate  the  sensitivity  of  operator  performance  on 
each  task  to  changes  in  task  difficulty  and  in  the  number  and  type  of  concurrent 
tasks  performed.  The  overall  goal  was  to  provide  an  empirical  "baseline"  from 
which  the  results  of  future  adaptive-automation  studies  (in  which  task  difficulty 
and  operator  workload  would  vary)  could  be  successfully  interpreted. 

Three tasks were carried out in support of these objectives.

First, extensive software changes were made to an existing multi-task
flight-simulation package, the Multi-Attribute Task Battery (MAT), which
includes  tracking,  monitoring,  fuel  management,  and  ATC  communications 
tasks  (Comstock  &  Arnegard,  1990).  The  revised  MAT  software  enabled 
independent  manipulation  of  parameters  of  each  flight  task  under  either 
manual  or  automated  performance  modes.  Successive  iterations  of  software 
development  and  informal  user  testing  led  to  a  version  that  is  suitable  for  the 
needs  of  the  experiments  to  be  carried  out  in  the  next  year  of  the  project. 

Second, a pilot study with 12 subjects was carried out to evaluate
the  sensitivity  of  the  revised  MAT  tracking  task  to  manipulations  of  task  difficulty 
and  practice,  using  either  a  joystick  or  a  mouse  as  the  control  device.  The 
results  established  an  appropriate  level  of  difficulty  (driving  function  frequency) 
for  the  tracking  task.  The  results  also  indicated  that  extensive  practice  was  not 


NAWCADWAR-92035-60 



required  to  reach  stable  performance  levels  on  this  task.  Satisfactory 
performance  data  were  obtained  with  either  control  device,  but  the  joystick  was 
chosen  for  subsequent  studies  because  of  operator  preference  and  its  greater 
similarity  to  cockpit  control  devices. 

Third,  an  experiment  with  8  subjects  was  carried  out  to  examine 
the effects of task combination (single-, dual-, and multi-task) on performance of
the  tracking,  monitoring,  and  fuel  management  tasks.  Performance  on  each 
task  decreased  systematically  from  single-task  to  dual-task  and  from  dual-task 
to  multi-task  combinations.  However,  the  tracking  and  monitoring  tasks  were 
the  most  sensitive  to  task  combination;  performance  on  the  fuel-management 
task  was  less  sensitive,  probably  due  to  high  inter-subject  variability.  The 
performance  profiles  obtained  were  consistent  with  operator  limitations  in 
perceptual/cognitive processing resources or in structural (input/output) factors.
However,  it  was  argued  that  resource  scarcity  was  the  major  source  of 
performance  decrement.  Taken  together  with  the  data  from  the  pilot  study,  the 
results  established  the  sensitivity  of  the  tracking,  monitoring,  and  fuel 
management  tasks  of  the  revised  MAT  battery  to  variations  in  task  difficulty  and 
task  load. 


Overall,  the  three  studies  were  successful  in  meeting  the  first 
major goal of the adaptive-automation research program: to establish a
baseline  of  empirical  performance  data  in  a  multi-task  flight-simulation 
environment.  These  results  will  help  in  the  design  and  interpretation  of  results 
of  future  adaptive-automation  studies  that  will  be  carried  out  as  part  of  this 
research  program. 
INTRODUCTION

Background:  What  is  Adaptive  Automation? 

Recent  technological  advances  have  made  viable  the 
implementation  of  intelligent  automation  in  advanced  tactical  aircraft.  The  use 
of  this  technology  has  given  rise  to  a  number  of  new  human  factors  issues  and 
concerns (NASA, 1989; Wiener, 1988). Errors in highly automated aircraft
have  been  linked  to  the  adverse  effects  of  automation  on  the  pilot's  system 
awareness,  monitoring  workload,  and  ability  to  revert  to  manual  control 
(Chambers  &  Nagel,  1985;  Hart  &  Sheridan,  1984;  Parasuraman,  1987;  Wiener, 
1988).  These  problems  have  been  attributed  to  technology-centered 
automation  design,  in  which  engineering  advances  largely  determine  whether 
and  how  automation  is  introduced  into  the  cockpit,  as  opposed  to  human- 
centered  design,  which  also  takes  into  account  pilot  capabilities  and 
limitations  in  using  automation  (NASA,  1989). 

Partly in response to these concerns, adaptive automation,1 or
automation  that  is  implemented  dynamically  in  response  to  changing  task 
demands  on  the  pilot,  has  been  proposed  (Rouse,  1988).  Adaptive  automation 
may  be  superior  to  nonadaptive  or  "traditional"  automation  because  it  is 
thought  to  improve  pilot  situational  awareness,  increase  task  involvement, 
regulate  workload,  enhance  vigilance,  and  maintain  manual  skill  levels 
(Hancock,  Chignell,  &  Lowenthal,  1985;  NASA,  1989;  Noah  &  Halpin,  1986; 
Parasuraman,  1987;  Parasuraman  &  Bowers,  1987;  Rouse,  1976,  1988; 
Wickens  &  Kramer,  1985).  At  present,  however,  empirical  evidence  for  the 
efficacy  of  adaptive  automation  is  lacking.  In  a  review  of  the  literature  on 
adaptive  automation,  Parasuraman,  Bahri,  Deaton,  Morrison,  &  Barnes  (1990) 
found  few  laboratory  or  field  studies  of  the  effects  of  adaptive  automation  on 
pilot  performance.  If  adaptive  automation  is  to  be  a  viable  cockpit  design  option, 
more  needs  to  be  learned  about  its  effects  on  performance  under  different  flight 
conditions. 


1  Also  referred  to  as  "adaptive  aiding"  and  "adaptive  function  allocation" 
The report by Parasuraman et al. (1990) provides an extensive
discussion  of  various  aspects  of  the  application  of  adaptive  automation  to  flight 
operations.  The  reader  may  consult  this  report  for  details  concerning  such 
issues  as  the  various  types  of  adaptive  automation,  the  logic  used  by  the 
adaptive  system  to  implement  task  changes,  the  question  of  pilot  consent  to 
suggested  adaptive  changes,  and  so  on.  For  example,  adaptive  automation 
may  include  allocation,  transformation,  or  partitioning  of  piloting  tasks  (Rouse, 
1988).  The  adaptive  logic  may  use  a  number  of  different  procedures  to  initiate 
task changes, for example, mission requirements, the designer’s model of the
pilot's  behavior  (including  pilot  intentions),  or  the  actual  measurement  of  pilot 
behavior  (including  physiology),  whether  off-line  or  on-line  (Parasuraman  et  al., 
1990).  The  task  changes  identified  by  the  adaptive  system  may  require  the 
pilot’s  consent,  as  is  conceived  in  Lockheed’s  Pilot’s  Associate  project,  or  the 
adaptive  system  may  initiate  the  changes  autonomously  after  informing  the 
pilot.  For  the  purposes  of  the  present  series  of  studies  we  assume  a  relatively 
simple  adaptive  system  in  which  the  system  allocates  tasks  to  either  the 
operator  or  the  computer.  No  special  form  of  adaptive  logic  is  assumed  and 
operator  consent  is  not  sought  (although  the  adaptive  task  changes  are  not 
implemented  without  informing  the  operator).  These  limitations  are  necessary 
as  a  starting  point  because  the  studies  we  are  conducting  are  the  initial 
empirical  studies  investigating  the  effects  of  adaptive  automation  on  operator 
performance.  In  subsequent  studies  we  will  examine  more  complex  modes  of 
adaptive  automation  (e.g.,  involving  performance-based  adaptation  and 
operator  consent). 


Program  of  Research 

The  aim  of  the  present  program  of  research  is  to  investigate 
issues  related  to  the  efficacy  of  adaptive  automation  in  a  series  of  experiments 
examining  performance  on  several  simulated  flight-related  tasks.  Any 
adaptive  automation  scheme,  irrespective  of  the  adaptive  logic  used  or  the  task 
changes  implemented,  involves  transitions  or  changes  from  one  level  of 
automation  of  a  task  to  another.  Our  overall  goal  is  to  understand  the  impact  on 
performance  of  both  the  dynamics  of  such  transitions  as  well  as  the  static 
demands  associated  with  each  level  of  automation  in  isolation.  For  example,  a 
particular flight function may be automated for long periods of time, then be
carried  out  manually  for  a  short  period,  and  then  revert  for  another  long  period 
to  automated  control.  We  refer  to  this  as  long-cycle  adaptive  automation.  At 
the  other  extreme,  in  short-cycle  adaptive  automation,  a  given  flight  function 
may  be  cycled  between  manual  and  automated  control  quite  frequently, 
particularly  if  the  adaptive  logic  is  very  sensitive  to  small  changes  in  task 
demands  or  pilot  workload. 

Our  studies  will  examine  the  effects  of  both  short-cycle  and  long- 
cycle  adaptive  automation  on  performance  in  a  multi-task  environment.  We 
plan  to  use  a  cost-benefit  approach  to  studying  the  efficacy  of  adaptive 
automation.  Our  experiments  will  be  designed  to  obtain  empirical  evidence  for 
the  claimed  beneficial  effect  of  adaptive  automation  on  performance  as  well  as 
to  document  the  existence  of  possible  costs. 


Present  Studies 

Our  initial  efforts  are  aimed  at  investigating  the  effects  of  both 
short-term  and  long-term  shifts  in  adaptive  automation  (i.e.,  task  allocation)  on 
the performance of flight-relevant tasks that tap three broad information-processing
domains: perceptual-cognitive (system monitoring), cognitive-strategic
(fuel management), and perceptual-motor (tracking). As discussed in
the  Parasuraman  et  al.  (1990)  review,  a  few  studies  have  examined  the  effects 
of  automation  on  human  performance,  but  these  studies  have  mostly  used 
static automation, i.e., where the set of tasks that are automated and manual
remains  fixed  (Fuld,  Liu,  &  Wickens,  1987;  Idaszak  &  Hulin,  1989;  Kibbe  & 
Wilson,  1989;  Wickens  &  Kessel,  1981).  Studies  have  not  been  conducted  in 
which  operators  are  cycled  through  phases  of  manual  and  automated 
performance  of  a  task,  which  is  a  key  aspect  of  adaptive  automation.  In  order  to 
determine  whether  adaptive  automation  has  positive  effects  on  performance 
as  claimed,  automation  effects  need  to  be  examined  in  conditions  where 
operators  are  shifted  between  manual  and  automated  conditions  rather  than 
always  perform  in  a  manual  or  an  automated  mode. 
In the present report we describe the development of our multi-task
capability and present an empirical evaluation of its performance
characteristics. This will provide the basis for investigating effects of automation
shifts on performance, which is described in the second report in this series.
The main goals of the present study were:

• To develop a robust and sensitive set of flight-relevant
tasks. Major sub-goals include: (1) the ability to
sample performance data continuously; and (2) the ability to
change task parameters flexibly (i.e., those required for a broad
range of adaptive-automation studies) without having to
make major software changes.

• To provide "baseline" performance data for each task in
isolation that will be useful in interpreting performance
changes in future adaptive-automation studies using the
same tasks.

• To evaluate the sensitivity of task performance to changes
in task difficulty and in the number and type of other
concurrent tasks.


The  present  report  provides  a  description  of  three  main  tasks  that 
were  carried  out  in  the  initial  phase  of  our  research  program.  First,  an  existing 
multi-task  flight-simulation  package,  the  Multi-Attribute  Task  Battery  (MAT) 
(Comstock  &  Arnegard,  1990),  was  extensively  revised.  The  MAT  software, 
which  includes  tracking,  monitoring,  fuel  management,  and  ATC 
communications  tasks,  was  revised  to  support  our  empirical  studies  on 
adaptive  automation.  Next,  the  results  of  a  pilot  study  are  presented.  The  aim 
of  the  pilot  study  was  to  obtain  measures  of  performance  of  one  of  the  tasks  of 
the  revised  MAT,  tracking,  under  different  levels  of  task  difficulty  and  as  a 
function  of  control  input  (joystick  versus  mouse)  and  practice.  Finally,  the 
results  of  an  experiment  investigating  single-  and  multiple-task  performance 
characteristics  of  the  MAT  battery  are  reported. 
SOFTWARE DEVELOPMENT: EXTENSION OF THE MAT BATTERY

In  order  to  begin  our  investigations  of  adaptive  automation  and 
human  performance,  we  needed  to  develop  a  benchmark  set  of  tasks  to  be 
used  in  simulations  of  flight  operations.  This  set  of  tasks  would  then  serve  as 
the  primary  research  vehicle  for  investigating  a  variety  of  issues  arising  from 
the  use  of  adaptive  systems  in  the  cockpit.  In  developing  this  capability,  we 
used  the  following  criteria:  (1)  the  tasks  used  should  be  analogous  to  some  of 
the  activities  that  crewmembers  perform  during  flight;  (2)  at  the  same  time,  the 
tasks  used  should  be  directly  related  to  or  have  analogs  to  those  studied  in  the 
research  literature  on  cognitive  psychology  and  human  performance;  (3)  a 
balance  should  be  struck  between  the  need  for  fidelity  to  real  aircraft 
operations  and  the  need  for  experimental  control  over  independent,  dependent, 
and  extraneous  variables,  with  a  bias  towards  experimental  control,  at  least  in 
the initial stages of the research; (4) the hardware and software requirements
for  implementing  the  tasks  should  be  modest,  i.e.,  at  the  inexpensive,  low-end 
personal-computer  level  rather  than  expensive  graphics  workstations  or 
minicomputers;  (5)  performance  data  should  be  available  continuously  at 
experimenter-defined sampling rates for each piloting task; and (6) both
nonpilots and pilots should be able to perform the tasks.

On  the  basis  of  these  criteria,  we  chose  to  use  the  Multi-Attribute 
Task  (MAT)  battery,  developed  by  Comstock  and  Arnegard  (1990),  as  a  starting 
point  for  the  development  of  our  task  battery.  To  meet  a  number  of  other 
requirements  of  the  adaptive  automation  research  program,  the  MAT  software 
was  extensively  revised,  as  described  below. 

The  MAT  battery  consists  of  four  main  tasks  that  are  presented  in 
different windows on the monitor of an AT-class personal computer: tracking,
system  monitoring,  fuel  (or  resource)  management,  and  ATC  communications. 
Also  present  are  a  "scheduling"  window  showing  the  beginning  and  duration  of 
the  tracking  and  communications  tasks  and  a  "pump  status"  window  showing 
the  flow  rates  of  the  pumps  of  the  fuel  management  task.  All  windows  are 
dynamically  updated  and  independent  responses  are  required  for  each  task. 
Figure  1  shows  a  typical  display  of  MAT  while  in  operation. 


[Figure 1. Typical MAT display in operation. Recoverable panel titles: SYSTEM MONITORING, TRACKING.]
The four tasks of the MAT meet the above-mentioned criteria to
various  degrees.  For  example,  with  respect  to  criteria  1  and  2,  the  tracking 
(e.g.,  Wickens,  1986)  and  monitoring  tasks  (e.g.,  Parasuraman,  1986;  Wiener, 
1984)  have  analogs  in  both  the  aircraft  cockpit  and  the  human  performance 
research  literature.  However,  while  fuel  management  and  ATC  communications 
are  clearly  tasks  that  every  pilot  performs  in  the  cockpit,  these  tasks,  or  analogs 
of  them,  have  not  been  systematically  studied  in  the  laboratory  by  experimental 
psychologists.  Nevertheless,  the  ability  to  exercise  good  experimental  control 
and  obtain  continuous  performance  data  for  these  tasks,  including  the 
performance  of  nonpilots  (criteria  4,  5,  and  6),  indicates  that  the  lack  of  an 
existing  body  of  empirical  data  and  theory  for  these  tasks  is  not  a  major 
drawback.  Any  such  limitation  is  balanced  by  the  greater  realism  for  aircraft 
operations  that  the  inclusion  of  these  tasks  provides. 

In  order  to  meet  the  special  needs  of  our  adaptive  automation 
research  program,  the  MAT  software  was  extensively  revised  to  allow  the 
following  features: 

• Independent script-driven presentation of each of the four
tasks. Parameter changes for any task do not require
reprogramming or extensive menu selection but simple
editing of the relevant task script.

• Menu-selectable variation in the presentation and relative
positioning of the task windows (e.g., deleting a window or
reversing the position of two windows). This feature
allows the investigator to examine the effects on
performance of task layout and other aspects of the user
interface.

• Script-driven automation of any task or combination of
tasks. This feature was available only for tracking in the
original version of MAT but is available for all tasks except
ATC communications in the revised version.

• Menu-selectable or script-driven variation in the "efficiency"
and "reliability" of automation. This feature is unique to the
revised MAT. It can be implemented in two versions: (a)
automation that is 100% reliable, but has slight "deviations"
or "imperfections," i.e., the automation performs the task
without error but exhibits slight fluctuations from normal
(e.g., an occasional drift of the target in the tracking task to
one point of the display without corrective action being
applied immediately); (b) automation that is less than 100%
reliable. Feature (a) is available for all tasks and feature (b)
for the monitoring and fuel management tasks.

• Variable performance sampling rates. The rate of sampling
operator performance of each task is variable, from a low
of every 10 min to a high of every 0.1 sec.

Extensive  software  development  of  MAT  was  carried  out  in  the 
initial  months  of  the  project.  An  iterative  software  design  procedure  was  used. 
Stimulus  and  task  parameters,  response  modes,  and  other  aspects  of  the  user 
interface  were  systematically  varied.  User  tests  were  then  carried  out  with  lab 
personnel.  These  informal  user  performance  tests  were  used  as  a  basis  to 
implement  some  of  the  display/interface  changes,  reject  some  of  the  software 
changes,  and  revise  others.  It  is  anticipated  that  successive  iterations  will  be 
required  as  the  requirements  for  future  studies  change.  The  current  version  of 
the  revised  MAT  is  designed  to  meet  the  needs  of  the  experiments  to  be  carried 
out  in  the  next  year  of  the  project. 

A  key  feature  of  the  revised  MAT  battery  is  that  the  component 
tasks are dynamic. Many laboratory tasks used in human performance research
appear  static;  displays  are  updated  intermittently  and  events  are  presented 
discretely.  In  contrast,  the  MAT  task  displays  are  updated  continuously  and 
operator  responses  are  required  intermittently.  This  feature  gives  the  MAT 
displays  very  much  the  "feel"  of  real-world  displays  found  in  the  aircraft  cockpit 
or  the  power  plant  control  room,  although  the  display  graphics  and  symbology 
are  representative  of  but  not  exact  replicas  of  any  real-world  system.  This 
approximation to real displays is combined with the ability to exercise close
control over task parameters (such as the timing and frequency of events, the
positioning of task information, sampling rate of operator responses, etc.), a
feature  that  is  normally  only  characteristic  of  artificial  laboratory  tasks. 

Descriptions  of  the  four  main  MAT  tasks  are  presented  below. 
Note  that  the  tasks  in  the  revised/extended  MAT  differ  from  those  described  in 
the  original  report  by  Comstock  and  Arnegard  (1990),  particularly  for  tasks  in 
the  automated  mode. 

Tracking  Task 

Manual  Mode.  A  first-order,  two-dimensional  compensatory 
tracking task with joystick control is presented in one window of the MAT
display  (see  Figure  1).  Dashed  x-  and  y-  axes  are  provided  for  reference. 
Within  the  window  is  a  smaller  dashed  rectangle  drawn  around  the  center  point 
of  the  window.  A  green  circular  target  symbol,  representing  the  deviation  of  the 
aircraft  from  its  course,  fluctuates  within  the  window  in  the  x-  and  y-  directions 
according  to  a  specified  forcing  function  consisting  of  a  sum  of  nonharmonic 
sine  waves.  The  highest  (cut-off)  frequency  of  the  forcing  function  can  be 
varied;  typically  0.05  -  0.1  Hz  cut-off  frequencies  are  used  in  our  studies. 
Control inputs are provided by a displacement joystick. The control dynamics are
first-order,  or  velocity  control.  If  no  control  input  is  applied,  the  aircraft  symbol 
drifts  away  from  the  center  towards  the  edges  of  the  window.  The  subject’s  task 
is  to  keep  the  aircraft  within  the  central  rectangle  by  applying  the  appropriate 
control inputs in the x- and y- directions. For example, if the aircraft is to the right
of  center,  a  leftward  joystick  movement  will  cause  the  circle  to  return  to  the 
center.  Subjects  are  given  training  in  first-order  control  by  demonstrations  of  the 
effects  of  small  and  large  control  inputs  (in  either  the  x-  or  y-  directions)  on  the 
speed  of  movement  of  the  aircraft. 
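
The manual-mode dynamics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MAT implementation: the component frequencies, amplitude, gain, and the names `forcing_function` and `simulate_axis` are hypothetical, and the sketch assumes the displayed error is the disturbance plus the integrated joystick input. Only the sum-of-nonharmonic-sines disturbance, the 0.1 Hz cut-off, the 10 Hz sampling interval, and the first-order (velocity) control come from the text.

```python
import math

# Hypothetical component frequencies in Hz (not from the report). They are not
# integer multiples of one another, and the highest one is the cut-off (0.1 Hz).
FREQS_HZ = [0.023, 0.041, 0.067, 0.1]


def forcing_function(t, freqs=FREQS_HZ, amplitude=1.0):
    """Disturbance at time t (sec): a sum of nonharmonic sine waves."""
    return amplitude * sum(math.sin(2.0 * math.pi * f * t) for f in freqs)


def simulate_axis(control, duration_s=10.0, dt=0.1, gain=1.0):
    """Simulate one axis of compensatory tracking with first-order control.

    `control(t, error)` returns the joystick displacement. The displacement
    sets the rate of change of the control contribution (velocity control),
    which is added to the disturbance to give the displayed error.
    """
    contribution = 0.0
    errors = []
    for k in range(int(round(duration_s / dt))):
        t = k * dt
        error = forcing_function(t) + contribution
        errors.append(error)
        contribution += gain * control(t, error) * dt  # Euler integration
    return errors


# No input: the symbol simply follows the disturbance.
no_input = simulate_axis(lambda t, error: 0.0)
# A proportional input opposing the displayed error keeps the symbol near center.
corrected = simulate_axis(lambda t, error: -2.0 * error)
```

Comparing `max(abs(e) for e in no_input)` with the same quantity for `corrected` shows how a simple opposing input holds the symbol closer to the center than no input at all.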

Automated  Mode.  Under  automation  control,  the  joystick  is 
disabled  and  the  aircraft  movements  are  compensated  for  by  software. 
However,  small  fluctuations  around  the  center  of  the  window  remain,  to  simulate 
random  perturbations  in  the  automatic  control.  Under  normal  automated 
conditions,  therefore,  the  aircraft  appears  to  be  anchored  at  the  center  of  the 
window, but with very small movements about the center that give the
appearance  of  a  dynamic  rather  than  completely  static  display. 

Another  automation  control  option  is  the  appearance  of 
occasional  "deviations"  in  the  automatic  control.  Under  these  conditions,  the 
aircraft  begins  to  drift  from  the  center  until  it  reaches  the  inner  rectangle  and 
then  drifts  back.  The  deviation  can  be  programmed  to  occur  at  specified 
random  intervals  (or  not  at  all).  This  option  is  provided  so  that  the  experimenter 
can  simulate  the  workload  associated  with  operator  "supervisory  control"  of  the 
automation  (e.g.,  Parasuraman  et  al.,  1990;  Wiener,  1988).  The  degree  to  which 
the  operator  "supervises"  the  automation  can  then  be  roughly  estimated  by 
querying  the  operator  after  the  task  is  completed  as  to  the  number  of  times  such 
deviations  occurred.  To  discourage  the  operator  from  continuous,  active  task 
processing  aimed  at  detecting  such  deviations,  the  deviations  should  be 
presented  in  the  form  of  "catch  trials,"  i.e.,  the  operator  should  be  told  that 
deviations might occur but they should not be presented in every block.2


Performance  Measures.  Operator  performance  of  the  tracking  task 
is  evaluated  by  sampling  the  x  and  y  control  inputs  at  10  Hz  and  thus  deriving 
the  x  and  y  deviations.  The  root  mean  square  (RMS)  error  is  then  computed  for 
the  samples  obtained  over  a  1-sec  period.  In  computing  the  combined 
horizontal  and  vertical  deviations  from  the  target,  vertical  deviations  are 
converted  (in  proportion  to  the  monitor  x  and  y  resolution)  to  horizontal  pixel 
units before combination with the horizontal deviations:

\[
\text{RMS error} = \left[ \frac{1}{N} \sum_{i=1}^{N} \left\{ (\Delta x_i)^2 + (K \, \Delta y_i)^2 \right\} \right]^{1/2}
\]

where \Delta x_i and \Delta y_i are the x and y deviations, K is the monitor resolution ratio
(horizontal/vertical), and N is sample size.


2  We  thank  Jonathan  Gluckman  of  NADC  for  a  suggestion  that  led  to  the  development  of 
this  automation  option. 
RMS error scores for successive 1-sec epochs can also be averaged over a
longer  time  period  of  performance  (e.g.,  5  or  10  min)  to  yield  a  mean  RMS 
error  score  for  a  block. 
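
As a rough sketch of the scoring just described, the following Python computes the RMS error for one epoch and the mean RMS error over a block. The function names `rms_error` and `block_mean_rms` are hypothetical, and the value of K used in any real analysis would depend on the monitor; the 10 samples per epoch follow from the 10 Hz sampling rate and 1-sec epochs given in the text.

```python
import math


def rms_error(dx, dy, k):
    """RMS error for one epoch.

    dx, dy: x and y deviations (pixels); k: monitor resolution ratio
    (horizontal/vertical), used to rescale dy into horizontal pixel units.
    """
    if len(dx) != len(dy) or not dx:
        raise ValueError("dx and dy must be non-empty and of equal length")
    total = sum(x * x + (k * y) ** 2 for x, y in zip(dx, dy))
    return math.sqrt(total / len(dx))


def block_mean_rms(dx, dy, k, samples_per_epoch=10):
    """Mean of the per-epoch RMS errors (10 samples = 1 sec at 10 Hz)."""
    if len(dx) < samples_per_epoch:
        raise ValueError("need at least one full epoch of samples")
    epochs = [
        rms_error(dx[i:i + samples_per_epoch], dy[i:i + samples_per_epoch], k)
        for i in range(0, len(dx) - samples_per_epoch + 1, samples_per_epoch)
    ]
    return sum(epochs) / len(epochs)
```

For example, with K = 1 and constant deviations of 3 pixels horizontally and 4 pixels vertically, every epoch has an RMS error of 5 pixels, so the block mean is also 5.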

System  Monitoring  Task 

Manual  Mode.  The  upper  left  window  in  Figure  1  presents  the 
system  monitoring  task.  Two  monitoring  sub-tasks  are  available,  and  either  or 
both  tasks  can  be  chosen:  warning  light  monitoring  and  probability  monitoring. 
The  warning  monitoring  sub-task  consists  of  two  boxes  in  the  upper  half  of  the 
window,  one  green  and  one  red.  The  light  on  the  left  is  normally  on,  as 
indicated  by  a  lighted  green  area.  The  subject  is  required  to  detect  the 
absence  of  this  light  by  pressing  the  "OK"  key  on  the  keyboard  when  the  light 
goes  out.  The  light  on  the  right  is  normally  off.  When  the  red  light  comes  on, 
the  subject’s  task  is  to  respond  by  pressing  the  "WARNING"  key  when  he  or  she 
detects  the  presence  of  that  red  light.  If  the  subject  does  not  detect  either 
abnormality,  the  situation  reverts  back  to  normal  status  after  a  preprogrammed 
timeout  period  (e.g.,  15  seconds). 

The probability monitoring sub-task consists of four vertical scales
with moving pointers. The scales are marked as indicating the temperature (T1,
T2) and pressure (P1, P2) of the two aircraft engines. In the normal condition,
the pointers fluctuate around the center of the scale within one limit in each
direction from center. Independently and at intervals according to the script,
each display’s pointer shifts its "center" position away from the middle of the
vertical display. The subject is responsible for detecting this shift, regardless of
direction, and responding by pressing the corresponding function key (T1, T2,
P1, or P2). The appropriate response key is identified below each vertical
display. 


Feedback  is  provided  when  the  out-of-range  status  of  a  scale  is 
correctly identified by the subject. The pointer of the scale to which the subject
responded moves immediately back to the center point and remains there
without fluctuating for a period of 1.5 seconds. If the subject fails to detect an
abnormality in the probability monitoring task, the fault is automatically
corrected  10  seconds  from  the  beginning  of  its  occurrence. 
Automated Mode. Under automation control, the keyboard keys
T1, T2, P1, and P2 are disabled and the scripted engine malfunctions are
identified and responded to by software. To enable the operator to know that
the automation has properly detected and corrected the malfunction, an
automation "reaction time" of 4 sec is built in. Another automation control option
is the occurrence of occasional "deviations" in the efficiency of control, as for the automated
tracking  task.  Under  this  option,  the  automation  correctly  identifies  and  corrects 
the  malfunction,  but  has  a  delayed  "reaction  time"  of  10  sec.  The  deviations 
can  be  scripted  to  occur  at  random  time  intervals.  A  second  automation  control 
option  concerns  the  reliability  of  the  automation.  "Automation  failures"  can  be 
scripted  to  occur  at  random  time  intervals.  When  such  a  failure  occurs,  one  of 
the  scale  pointers  goes  out  of  range.  However,  the  engine  malfunction  is  not 
detected  and  corrected  by  the  automation  within  the  4  sec  period.  Overall 
automation  reliability  is  computed  as  the  percentage  of  malfunctions  that  are 
correctly  identified  by  the  automation. 
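
The three automation behaviors described above (normal, delayed, and failed) and the reliability figure can be illustrated with a short Python sketch. The event labels and function names are hypothetical and do not reflect the MAT script format; the 4-sec and 10-sec reaction times and the definition of reliability are taken from the text.

```python
NORMAL_RT_S = 4.0    # built-in automation "reaction time" (from the text)
DELAYED_RT_S = 10.0  # reaction time during an efficiency "deviation" (from the text)


def automation_reaction_time(kind):
    """Seconds until the automation corrects a malfunction, or None on failure.

    `kind` is a hypothetical label: "normal", "deviation", or "failure".
    """
    if kind == "normal":
        return NORMAL_RT_S
    if kind == "deviation":
        return DELAYED_RT_S
    if kind == "failure":
        return None  # the malfunction is not detected by the automation
    raise ValueError(f"unknown event kind: {kind!r}")


def automation_reliability(kinds):
    """Percentage of malfunctions correctly identified by the automation."""
    if not kinds:
        raise ValueError("at least one malfunction is required")
    identified = sum(1 for k in kinds if automation_reaction_time(k) is not None)
    return 100.0 * identified / len(kinds)
```

Under this sketch a block with seven correctly handled malfunctions (whether prompt or delayed) and one scripted failure has a reliability of 87.5%, since a delayed correction still counts as a correct identification.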

Performance  Measures.  Operator  performance  for  the  two 
monitoring  tasks  is  evaluated  by  recording  all  key  presses  made  with  the  six 
response  keys  for  the  monitoring  task.  The  reaction  time  associated  with  a 
correct  response  (i.e.,  to  an  engine  malfunction  event)  is  also  computed  to 
within  a  resolution  of  0.1  sec.  The  percentage  of  correct  detection  responses 
(or  hit  rate),  the  percentage  of  false  responses  when  no  malfunction  occurs  (or 
false  alarm  rate),  and  the  mean  reaction  time  for  a  detection  response  can  be 
computed  from  these  data.  Incorrect  detection  responses,  i.e.,  when  the 
operator  detects  a  malfunction  but  presses  the  wrong  key  (e.g.,  presses  T1  for  a 
malfunction  in  the  temperature  of  engine  2)  are  also  recorded,  although  these 
tend  to  be  rare.  Hit  and  false  alarm  rates  and  mean  reaction  time  are  computed 
for  a  specified  period  (e.g.,  5  or  10  min)  within  a  block. 
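
A minimal sketch of how these measures might be derived from logged events and key presses follows. The data layout and the name `score_monitoring` are assumptions made for illustration. The report does not state the denominator used for the false alarm rate, so the sketch returns a false alarm count rather than a rate. The 10-sec window is the timeout given for the probability monitoring sub-task, and reaction times are rounded to the stated 0.1-sec resolution.

```python
def score_monitoring(events, responses, timeout_s=10.0):
    """Score one block of monitoring data.

    events:    list of (onset_time_s, key) for scripted malfunctions
    responses: list of (time_s, key) for operator key presses
    """
    responses = sorted(responses)
    used = set()          # indices of responses counted as hits
    windows = []          # (start, end) of each malfunction as experienced
    reaction_times = []
    for onset, key in events:
        end = onset + timeout_s
        for idx, (t, pressed) in enumerate(responses):
            if idx not in used and pressed == key and onset <= t <= end:
                used.add(idx)
                reaction_times.append(round(t - onset, 1))
                end = t   # a correct response ends the malfunction
                break
        windows.append((onset, end))

    wrong_key = false_alarms = 0
    for idx, (t, _) in enumerate(responses):
        if idx in used:
            continue
        if any(start <= t <= end for start, end in windows):
            wrong_key += 1      # pressed during a malfunction, but wrong key
        else:
            false_alarms += 1   # pressed when no malfunction was present

    hits = len(reaction_times)
    return {
        "hit_rate": 100.0 * hits / len(events) if events else 0.0,
        "mean_rt_s": sum(reaction_times) / hits if hits else None,
        "false_alarms": false_alarms,
        "wrong_key": wrong_key,
    }
```

Separating wrong-key presses from false alarms mirrors the distinction drawn in the text between incorrect detection responses and responses made when no malfunction occurs.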

Fuel  (Resource)  Management  Task 

Manual Mode. This task is meant to simulate the actions needed to
manage the fuel system of the aircraft. Figure 1 displays the fuel (resource)
management window. The six rectangular regions are tanks which hold fuel.
Levels  marked  in  green  within  the  tanks  represent  the  amount  of  the  fuel  in 
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each  tank,  and  these  levels  increase  and  decrease  as  the  amount  of  fuel  in  a 
tank  changes. 

Pumps  connect  the  tanks  so  that  fuel  can  be  transferred  from  one 
tank  to  another  in  the  direction  indicated  by  the  corresponding  arrow  and  fuel 
line.  The  numbers  underneath  four  of  the  tanks  (Tanks  A,  B,  C,  and  D) 
represent  the  amount  of  fuel  in  gallons  for  each  of  the  tanks.  This  number  is 
updated  every  2  sec  as  the  amount  of  the  fuel  in  the  tanks  increases  or 
decreases.  The  maximum  capacity  for  either  Tank  A  or  B  is  4000  gallons. 
Tanks  C  and  D  can  contain  a  maximum  of  2000  gallons  each.  The  remaining 
two  supply  tanks  have  an  unlimited  capacity. 

Subjects  are  instructed  to  maintain  the  level  of  fuel  in  both  Tanks  A 
and  B  at  2500  gallons  each.  This  critical  level  is  indicated  graphically  by  a  tick 
mark  in  the  shaded  bar  on  the  side  of  these  two  tanks.  The  numbers  under 
each  of  these  tanks  provide  another  means  of  feedback  for  the  subject.  The 
shaded  region  surrounding  the  tick  mark  represents  acceptable  performance. 
Tanks  A  and  B  are  depleted  of  fuel  at  the  rate  of  800  gallons  per  minute. 
Therefore,  in  order  to  maintain  the  task  objective,  subjects  must  transfer  fuel 
from  the  lower  supply  tanks. 

The  process  of  transferring  fuel  is  accomplished  by  activating  the 
pumps.  Each  pump  can  only  transfer  fuel  in  one  direction,  as  indicated  by  the 
corresponding  arrow.  These  pumps  are  turned  on  when  the  corresponding 
number  key  is  pressed  by  the  subject.  Pressing  the  key  a  second  time  turns  that 
particular  pump  off  and  so  on.  The  pump  status  is  indicated  by  the  color  of  the 
square  area  on  each  pump.  When  that  area  is  black,  or  lacking  in  color,  the 
pump  is  off.  A  green  light  in  this  area  indicates  that  the  pump  is  actively 
transferring  fuel. 

The  flow  rates  for  each  pump  are  presented  in  the  "Pump  Status" 
window.  The  first  column  of  numbers  represents  the  pump  number,  one 
through  eight.  When  a  pump  is  activated,  its  flow  rate  is  presented  next  to  the 
pump number in this window. When a pump is off, its flow rate is zero. Pumps 1
and 3 transfer fuel at the rate of 800 gallons per min. Pumps 2, 4, 5, and 6
transfer fuel at the rate of 600 gallons per min and Pumps 7 and 8 at 400
gallons per min.

During  some  sections  of  the  simulation,  pump  faults  occur.  This  is 
indicated  by  the  appearance  of  a  red  light  in  the  square  on  the  pump.  When 
this  occurs,  the  pump  which  is  in  the  fault  mode  is  inactive.  Fuel  cannot  be 
transferred  through  that  pump  until  the  fault  is  corrected.  The  operator  has  no 
control  over  the  fault  correction;  the  duration  of  the  fault  is  written  into  the  script 
that  directs  the  program.  When  the  fault  is  corrected,  the  status  of  that  pump  is 
automatically  returned  to  the  "off"  mode,  regardless  of  its  status  before  the  fault 
condition. 


Likewise,  when  a  tank  becomes  full  to  capacity,  all  incoming 
pumps  are  automatically  turned  "off".  For  example,  if  all  of  the  pumps  were 
activated  and  Tank  A  reached  its  capacity  of  4000  gallons,  Pumps  1,  2,  and  8 
would  automatically  turn  "off".  Furthermore,  if  a  tank  were  to  become  totally 
depleted  of  fuel,  all  outgoing  pumps  would  be  deactivated. 
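
The pump behavior described above amounts to a small state machine, sketched
here in Python. The class and method names are hypothetical and are not taken
from the MAT software. The rule that key presses are ignored while a pump is in
the fault mode is an assumption, since the text states only that no fuel can be
transferred until the fault is corrected.

```python
class Pump:
    """One fuel pump with the three states described in the text."""

    # Color of the square area on the pump for each state.
    LIGHTS = {"off": "black", "on": "green", "fault": "red"}

    def __init__(self, number, rate_gal_per_min):
        self.number = number
        self.rate = rate_gal_per_min
        self.state = "off"  # all pumps are off at the start of the task

    def press_key(self):
        """Toggle on/off; assumed to have no effect during a fault."""
        if self.state == "fault":
            return
        self.state = "off" if self.state == "on" else "on"

    def start_fault(self):
        """Scripted fault: the pump becomes inactive."""
        self.state = "fault"

    def clear_fault(self):
        """Return to 'off' regardless of the state before the fault."""
        if self.state == "fault":
            self.state = "off"

    def force_off(self):
        """Automatic shut-off (destination tank full or source tank empty)."""
        if self.state == "on":
            self.state = "off"

    @property
    def flow(self):
        """Flow rate shown in the Pump Status window (zero unless on)."""
        return self.rate if self.state == "on" else 0

    @property
    def light(self):
        return self.LIGHTS[self.state]
```

For example, a pump that was on before a fault reads "off" (black) once the
fault clears, so the operator has to press its key again to resume the transfer.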

At  the  onset  of  each  flight  simulation,  Tanks  A  and  B  contain 
approximately  2500  gallons  of  fuel  each  and  Tanks  C  and  D  contain 
approximately  1000  gallons  of  fuel  each.  All  pumps  are  off  at  the  beginning  of 
the  task,  leaving  all  strategic  action  to  the  operator’s  discretion. 
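
The fuel dynamics for a single tank can be sketched as follows, using the rates
and capacities given above. Only Tank A is modeled, with Pumps 1, 2, and 8 as its
incoming pumps (as in the example in the text). Treating the 2-sec display update
as the simulation step, ignoring outgoing pumps, and assuming the source tanks
never run dry are simplifications made for illustration. All names are
hypothetical.

```python
# Pump flow rates in gallons per minute, as listed in the text.
PUMP_RATES = {1: 800, 2: 600, 3: 800, 4: 600, 5: 600, 6: 600, 7: 400, 8: 400}

TANK_A_CAPACITY = 4000.0                     # gallons
TANK_A_INFLOW_PUMPS = frozenset({1, 2, 8})   # pumps that feed Tank A
DEPLETION_RATE = 800.0                       # gallons per minute
UPDATE_INTERVAL_SEC = 2.0                    # assumed simulation step


def step_tank_a(level, pumps_on, dt_sec=UPDATE_INTERVAL_SEC):
    """Advance Tank A by one step; return (new_level, pumps_still_on)."""
    pumps_on = set(pumps_on)
    inflow = sum(PUMP_RATES[p] for p in pumps_on if p in TANK_A_INFLOW_PUMPS)
    level += (inflow - DEPLETION_RATE) * dt_sec / 60.0
    if level >= TANK_A_CAPACITY:
        # A full tank switches all of its incoming pumps off.
        level = TANK_A_CAPACITY
        pumps_on -= TANK_A_INFLOW_PUMPS
    return max(level, 0.0), pumps_on


def simulate(level, pumps_on, seconds):
    """Run the tank for a number of seconds with no further operator input."""
    for _ in range(int(seconds / UPDATE_INTERVAL_SEC)):
        level, pumps_on = step_tank_a(level, pumps_on)
    return level, pumps_on
```

Starting from 2500 gallons with all pumps off, the level falls by about 800
gallons in one minute. With Pump 1 alone switched on, inflow matches depletion
and the level holds at the target.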

Automated Mode. Under automation control, the keys for
activating pumps 1 through 8 are disabled. All pump activations are executed
from a script that mimics expert performance (see footnote 3), combined with the
following: (1) all fuel level changes are responded to; (2) appropriate pump
activations are executed; (3) no "extra" pumps are activated (e.g., activating
pump 2 when that has no direct effect on fuel level; this sometimes occurs
during manual performance). In addition, two different kinds of pump fault are
executed from the script. The first kind of pump failure lasts 60 sec and is
similar to pump faults in the manual mode. The second kind of pump failure lasts
90 sec. Under normal automation conditions, only the first kind of pump failure
is used. When the experimenter wishes to evaluate the level of operator
supervision of the automation, a few additional pump failures of the second type
are included in certain "catch blocks". Thus, as is done with the tracking and
monitoring tasks, occasional deviations are built into the script. The operator
is told that there may be occasional deviations in the time taken to detect and
fix pump failures by the automation (90 sec versus 60 sec) and that they may be
queried subsequently about the occurrence of such deviations.


3 Defined as the performance of two laboratory personnel who had over 30 hours of
experience of manual performance on the fuel management task.

Performance  Measures.  Operator  performance  on  the  fuel 
management  task  can  be  evaluated  in  a  number  of  ways.  Detailed  records  of 
the  key  presses  that  the  operator  makes  are  kept  so  that  the  particular  strategy 
that  the  operator  uses  (if  any)  to  meet  task  objectives  can  be  ascertained.  A 
global measure of task performance can also be obtained by computing the
mean  RMS  error  in  the  fuel  levels  of  Tanks  A  and  B  (deviation  from  the  required 
level  of  2500  gallons).  Fuel  levels  are  sampled  and  RMS  error  computed  over 
a  30-sec  period.  RMS  error  scores  for  successive  periods  can  also  be  averaged 
over  a  longer  time  period  of  performance  (e.g.,  5  or  10  min)  to  yield  a  mean 
RMS  error  score  for  a  block.  A  second  global  measure  of  fuel  management 
performance  is  the  number  of  pump  activations  per  block,  although  this 
measure  can  only  be  meaningfully  interpreted  by  comparing  the  operator’s 
strategy  to  some  optimum  strategy  for  performing  the  task. 
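
The RMS error measure can be sketched in Python as follows. The report does not
give the sampling interval or state how the Tank A and Tank B errors are
combined. Sampling every 2 sec (15 samples per 30-sec period) and averaging the
two tanks' RMS errors are assumptions made for illustration, and the function
names are hypothetical.

```python
import math

TARGET_LEVEL = 2500.0  # required fuel level in gallons for Tanks A and B


def rms_error(levels, target=TARGET_LEVEL):
    """Root mean square deviation of sampled fuel levels from the target."""
    return math.sqrt(sum((x - target) ** 2 for x in levels) / len(levels))


def block_mean_rms(tank_a, tank_b, samples_per_period=15):
    """Mean RMS error over successive periods within a block.

    tank_a, tank_b: equal-length lists of sampled fuel levels
    samples_per_period: samples in one 30-sec period (assumes 2-sec sampling)
    """
    scores = []
    last_start = len(tank_a) - samples_per_period
    for start in range(0, last_start + 1, samples_per_period):
        end = start + samples_per_period
        # Assumed combination rule: average the two tanks' RMS errors.
        scores.append((rms_error(tank_a[start:end]) + rms_error(tank_b[start:end])) / 2)
    return sum(scores) / len(scores)
```

Because deviations are squared, a level held 100 gallons above the target and
one held 100 gallons below it give the same score.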

ATC  Communications  Task 

Manual  Mode.  The  communications  task  is  presented  in  the  lower 
left-hand  window  of  the  MAT  display  (see  Figure  1).  The  task  consists  of  a 
series  of  audio  messages  which  are  presented  to  the  operator  through 
headphones. These messages begin with a six-character call sign, repeated once,
and  a  command  to  change  the  frequency  of  one  of  the  channels  listed  on  the 
screen.  The  operator  must  discriminate  his  or  her  call  sign,  "NGT504",  from 
other  three-letter,  three-number  combinations.  The  subject’s  call  sign  is  always 
displayed  at  the  top  of  the  communication  window.  Subjects  are  required  to 
change  navigation  and  communication  frequencies  by  the  use  of  the  arrow 
keys.  The  up  and  down  arrow  keys  are  used  to  select  the  appropriate 
navigation  or  communication  radio  and  the  left/right  arrow  keys  increase  or 
decrease the selected radio frequency in increments of 0.2 MHz.

Automated Mode. This mode is not yet available.

Performance  Measures.  Operator  performance  on  the 
communications  task  is  evaluated  by  computing  the  mean  detection  rate,  false 
alarm  rate,  and  reaction  time  in  responding  to  the  aircraft’s  call  sign  over  a 
period  of  time.  Reaction  time  to  initiate  navigation  and  frequency  changes  and 
errors  in  making  these  responses  can  also  be  obtained. 

Hardware  Requirements 

The  revised  MAT  battery  runs  efficiently  on  any  AT-class  PC 
equipped with an EGA video card, although it is preferable to use a 386- or
486-class computer. A Heath voice card and an 8088-class PC are also needed
for  the  ATC  communications  task.  Accessories  needed  include  a  joystick  and 
I/O  card  for  the  tracking  task  and  a  pair  of  earphones  for  the  communications 
task. 


PILOT  STUDY 

The  pilot  study  was  carried  out  to  "calibrate"  the  first-order  tracking 
task  so  that  an  appropriate  level  of  difficulty  could  be  established  for  use  in 
subsequent studies of multiple-task performance and in the adaptive-
automation studies. Pilot data were already available on the performance
characteristics  of  all  four  tasks  of  the  MAT  (Comstock  and  Pope,  personal 
communication).  However,  performance  data  for  the  tracking  task  had  been 
gathered  using  a  mouse  as  a  control  device,  which  we  felt  was  not  the  most 
appropriate  control  input  for  a  tracking  task  meant  to  simulate  flight  operations. 
We  therefore  performed  a  pilot  study  examining  the  performance  characteristics 
of  the  tracking  task  using  both  a  mouse  and  a  joystick  as  control  devices. 

Subjects 


Twelve  volunteers  drawn  from  the  staff  of  the  Cognitive  Science 
Laboratory and the Department of Psychology, six males and six females,
participated in a single session lasting about 45 minutes. They ranged in age
from  22  to  35  years,  were  right-handed,  and  had  normal  (20/20)  or  corrected- 
to-normal  vision.  None  of  the  subjects  had  prior  experience  with  the  MAT  tasks. 

Procedure 


The  revised  MAT  was  used  with  the  tracking  window  active  under 
the  manual  mode.  The  12  subjects  were  allocated  randomly  to  two  equal 
groups  of  six  subjects  each  with  the  restriction  that  the  groups  be  matched  for 
gender.  One  group  used  the  mouse  as  a  control  device  while  the  other  group 
used  the  joystick.  Each  subject  was  tested  in  two  phases  consisting  of  several 
blocks  of  5  min  each.  In  the  first  phase,  following  instruction  and  training,  the 
tracking  task  was  performed  at  each  of  three  levels  of  difficulty,  defined  in  terms 
of  the  highest  (cutoff)  frequency  of  the  forcing  function,  as  follows:  .016  Hz 
(easy),  .064  Hz  (moderate),  and  .112  Hz  (difficult).  Within  each  group  (mouse 
or  joystick),  three  of  the  subjects  tracked  in  the  order  easy-moderate-difficult, 
while  three  tracked  in  the  reverse  order.  After  a  short  rest  break,  subjects 
performed  three  successive  5-min  blocks  of  the  tracking  task  (with  short  rest 
breaks)  at  the  moderate  level  of  difficulty,  in  order  to  assess  the  effects  of 
modest  levels  of  practice  on  tracking  performance  using  a  mouse  or  a  joystick. 

Results 


Preliminary  analysis  of  the  data  revealed  no  effect  due  to  subject 
gender.  The  results  of  all  subsequent  analyses  are  for  data  collapsed  across 
gender. 


The  mean  root  mean  square  (RMS)  error  in  the  x-  and  y-  directions 
was  computed  (in  adjusted  pixel  units;  see  previous  section  on  revised  MAT 
battery  for  the  formula  used  for  computing  RMS  error)  for  each  5-min  block  of 
the  tracking  task.  These  data  were  submitted  to  an  analysis  of  variance 
(ANOVA) with control device (mouse/joystick) and testing order as between-
subjects factors and forcing function frequency (difficulty level) as a within-
subjects factor. A second ANOVA of RMS error scores was also carried out for
the moderate-difficulty condition, with control device as a between-subjects
factor and blocks as a within-subjects factor.


Figure 2. Effects of forcing function frequency
(tracking difficulty) on mean RMS tracking error.
(X-axis: forcing function frequency, Hz.)


Figure  2  shows  the  mean  root  mean  square  (RMS)  error  in  the 
tracking  task  for  each  control  input  as  a  function  of  task  difficulty  (forcing  function 
frequency). RMS error increased with forcing function frequency, F(2,16) =
6.35, p < .01, confirming that the effect of task difficulty on performance was
reliable.  The  main  effect  of  control  device  was  not  significant,  indicating  that 
performance  was  equivalent  for  the  mouse  and  joystick.  The  effects  of  testing 
order  and  all  interactions  were  not  significant. 

Figure  3  shows  mean  RMS  error  scores  for  each  control  device  for 
the  moderate  difficulty  tracking  condition  as  a  function  of  blocks  of  practice. 
ANOVA  of  these  data  gave  no  significant  effects.  We  anticipated  that  RMS  error 
would decline with practice at tracking. Somewhat surprisingly, however, RMS
error did not change significantly with blocks of practice. This could have
occurred  because  subjects  may  have  reached  asymptotic  levels  of  performance 
in  the  earlier  phase  of  the  experiment  in  which  they  performed  the  tracking  task 
at  all  three  levels  of  task  difficulty. 


Figure 3. Effects of practice on tracking performance. (X-axis: blocks.)


Discussion 


The  present  results  confirmed  the  sensitivity  of  the  tracking  task  to 
changes  in  task  difficulty  as  manipulated  by  variations  in  the  forcing  function 
frequency.  Furthermore,  tracking  performance  was  relatively  insensitive  to 
practice  effects,  although  as  discussed  earlier  this  might  have  occurred 
because  subjects  reached  their  personal  performance  ceilings  in  prior  practice. 
Whatever the cause, the results indicate that for the present tracking task, only
a modest amount of practice is needed for performance with either the mouse
or  joystick  to  reach  relatively  stable  levels. 

On  the  basis  of  these  results,  we  chose  the  .064  Hz  forcing 
function  frequency,  representing  the  moderate  difficulty  condition,  as  the 
baseline  tracking  difficulty  level  for  subsequent  studies.  By  choosing  this 
frequency  one  could  be  confident  that  either  a  decrease  or  an  increase  in 
forcing  function  frequency  (as  might  occur  under  conditions  in  which  adaptive 
automation  is  invoked)  would  result  in  appropriate  changes  in  performance 
levels.  The  results  also  suggest  that  extensive  practice  is  not  required  to  reach 
stable  performance  levels  on  the  tracking  task.  Finally,  satisfactory  results  were 
obtained  with  either  the  mouse  or  the  joystick  control.  However,  the  joystick  was 
chosen  over  the  mouse  for  subsequent  studies  because  of  its  closer  relation  to 
cockpit control devices and because subjects reported finding the joystick
easier  to  use. 


MULTI-TASK  PERFORMANCE  CHARACTERISTICS 

The  pilot  study  established  an  appropriate  task  difficulty  level  for 
the  tracking  task.  As  mentioned  earlier,  task  parameters  required  to  attain 
particular  performance  levels  for  the  monitoring,  fuel  management,  and  ATC 
communications  tasks  were  known.  However,  for  each  of  these  tasks, 
performance  levels  were  obtained  for  single-task  conditions;  efficiency  levels  for 
dual-task  and  multi-task  performance  were  unknown.  The  present  study  was 
designed to obtain performance data in these conditions. Only three tasks were
used: tracking, monitoring, and fuel management. The hardware required to
run  the  communications  task  was  not  available  at  the  time  of  this  study. 
However,  this  task  is  now  in  operation  and  can  be  used  in  future  studies. 

Subjects 


Eight  students  from  The  Catholic  University  of  America,  4  males 
and  4  females,  participated  to  fulfill  a  course  requirement.  Each  subject  was 
tested in a single 2-hour session. Subjects ranged in age from 18 to 25
years. All subjects were right-handed and had normal (20/20) or corrected-to-
normal vision. To avoid carry-over from previous learning, none of the
participants had served as subjects in similar experiments before.

Procedure 


The  revised  MAT  was  used  with  the  tracking,  monitoring,  and  fuel 
management  windows  active  in  the  manual  mode.  Each  subject  performed  in 
each of seven task combination conditions: the three tasks (T = tracking; M =
monitoring; F = fuel management) alone, the three combinations of pairs of
tasks (TM, TF, and MF), and the multi-task condition (TMF) (see footnote 4).
Half the subjects performed the tasks in an order progressing from single
through dual to multi-task conditions: T-M-F-TM-TF-MF-TMF; while the other half
did the tasks in the reverse order: TMF-MF-TF-TM-F-M-T.
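
The seven task combination conditions and the two testing orders can be
generated with a short Python sketch. Alternating the two orders across
successive subjects is an assumption, as the report states only that half the
subjects received each order. The function names are hypothetical.

```python
from itertools import combinations

TASKS = "TMF"  # T = tracking, M = monitoring, F = fuel management


def task_conditions(tasks=TASKS):
    """All single-, dual-, and multi-task combinations, smallest first."""
    return [
        "".join(combo)
        for size in range(1, len(tasks) + 1)
        for combo in combinations(tasks, size)
    ]


def assign_orders(subject_ids):
    """Give alternate subjects the ascending and the reverse testing order."""
    ascending = task_conditions()
    descending = ascending[::-1]
    return {
        sid: (ascending if i % 2 == 0 else descending)
        for i, sid in enumerate(subject_ids)
    }
```

Calling `task_conditions()` yields T, M, F, TM, TF, MF, TMF, and reversing that
list gives the second order used in the study.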

Following instructions and training, each subject performed for
seven 10-min blocks, one for each of the seven task combination conditions.
Subjects were shown their results and were given feedback regarding their
performance at the end of each block. In the dual-task and multi-task
conditions, subjects were instructed to give equal priority to each task.

Results 


Preliminary  analysis  of  the  data  revealed  no  effects  due  to 
operator  gender  and  hence  the  data  were  collapsed  across  this  subject  variable 
in  all  subsequent  analyses.  Figure  4  shows  mean  RMS  error  for  the  tracking 
task  as  a  function  of  task  combinations.  Tracking  was  relatively  efficient  when 
carried  out  in  isolation  but  became  poorer  with  the  introduction  of  the  other 
tasks.  The  RMS  data  were  submitted  to  an  ANOVA  with  order  of  testing  as  a 
between-groups  factor  and  task  combinations  as  a  within-groups  factor.  ANOVA 
showed that RMS error varied significantly with task combinations, F(3,18) =
7.01, p < .01, but not with order or with the interaction of order and task
combination.  Figure  4  indicates  that  tracking  error  increased  markedly  in  the 

4 In all conditions, only the relevant task windows were displayed. For example, in the
dual-task tracking and monitoring condition, only these two windows were active: the
fuel management (as well as pump status) windows were empty.


          TM        TF        TMF

T       133.6*    205.3*    205.6*

TM                 71.7      72.0

TF                            0.3


* p < .05.

Table 1. Differences in tracking RMS error between different task combinations
(T = tracking; M = monitoring; F = fuel management).


Figure 5. Accuracy of monitoring performance as a function of single-, dual-,
and multi-task conditions. (M = monitoring; T = tracking; F = fuel management).


Monitoring accuracy (rate of correct identification of engine malfunctions) under
single- and multi-task conditions is shown in Figure 5. When the monitoring task
was performed alone, accuracy was very close to 100%. Monitoring accuracy was
reduced when the task was paired with the tracking or fuel management task, and
further reduced when all three tasks were performed. But although there was a
trend towards performance reduction with task combination, the effect of
conditions was not significant, F(3,18) < 1. The ceiling levels of single- and
dual-task performance (over 93%) preclude statistical analyses of these data.


Figure  6.  Mean  monitoring  task  reaction  time  as  a  function  of  single-,  dual-,  and 
multi-task  conditions.  (M  =  monitoring;  T  =  tracking;  F  =  fuel  management). 


Mean reaction time in the monitoring task is displayed in Figure 6.
An almost linear increase in RT occurred with task combinations, an increase
that was significant by ANOVA, F(3,18) = 10.01, p < .001. The effects of testing
order and the order x task combination interaction were not significant. The
significance of differences between ordered means for the different task
combinations was tested using the Newman-Keuls test, as was done for
the tracking task. Table 2 gives the mean differences in reaction time between
all possible task combinations. Again, as for the tracking task analysis, five
contrasts were expected to be reliable. Table 2 shows that four of these
contrasts were significant. The fifth, comparing the monitoring/fuel management
condition with the multi-task condition, was of borderline significance.

          TM       MF       TMF

M        1.47*    1.75*    1.67*

TM                0.28     1.69*

MF                         1.41†


* p < .05.

† p < .07.

Table 2. Differences in monitoring task reaction time (in sec) between different
task combinations (T = tracking; M = monitoring; F = fuel management).


Taken  together  with  the  results  for  monitoring  accuracy,  these 
results  indicate  that  monitoring  performance  was  sensitive  to  task  combination, 
but  that  speed  of  monitoring  declined  more  markedly  than  accuracy  as  the 
tracking  and  fuel  management  tasks  were  added  to  the  monitoring  task. 

Performance  on  the  fuel  management  task  is  shown  in  Figure  7. 
This  figure  indicates  an  increase  in  RMS  error  in  setting  fuel  levels  with  task 
combination. Although mean RMS error appears to increase markedly with task
combination, there was very high inter-subject variability for this measure of fuel
management performance, and the effect of conditions was not significant,
F(3,18) < 1. The order by conditions interaction, however, was significant,
F(3,18) = 4.32, p = .05. The interaction came about because the subjects who
performed the single-task condition first had extremely high RMS error scores,
which led to a weakening of the effect of task combination, whereas subjects performing
the  tasks  in  the  reverse  order  showed  the  expected  increase  in  RMS  error  as  a 
function  of  task  combination.  The  unusually  high  initial  single-task  RMS  error, 
which  may  have  resulted  from  insufficient  practice  at  the  task  for  these  subjects, 
coupled  with  the  very  high  inter-subject  variability  in  this  measure,  contributed 
to  the  lack  of  significance  of  the  main  effect  for  task  combination  conditions. 


Figure 7. Fuel management performance as a function of single-, dual-, and
multi-task conditions. (F = fuel management; T = tracking; M = monitoring).


Discussion


The  results  of  the  present  study  point  to  a  fairly  regular  pattern  of 
performance  changes  on  each  task  in  response  to  concurrent  task  demand. 
Both  the  tracking  and  monitoring  tasks  showed  the  anticipated  performance 
decrements  from  single-  to  multi-task  performance  (Wickens,  1987).  These 
performance  decrements  were  in  the  expected  direction  and  were  systematic: 
for  the  tracking  task,  three  out  of  the  five  possible  contrasts  between  different 
task  pairings  (i.e.,  single-dual  and  dual-multi)  gave  reliable  evidence  of 
performance  decrement,  whereas  for  the  monitoring  task  (reaction  time 
measure), four out of the five comparisons were statistically significant. Given the
relatively  small  sample  size  of  this  study,  these  results  are  encouraging,  and 
indicate  that  both  the  tracking  and  monitoring  tasks  of  the  revised  MAT  battery 
are  sufficiently  sensitive  to  variations  in  task  load  and  should  therefore  be 
appropriate  for  our  future  studies  of  adaptive  automation  in  which  task  difficulty 
and  task  load  will  vary  dynamically. 

While  both  the  tracking  and  monitoring  tasks  were  highly  sensitive 
to  concurrent  task  load,  there  was  an  interesting  dissociation  between  the 
tasks  at  the  highest  level  of  load.  Tracking  performance  decreased  significantly 
from single-task performance to both dual-task loadings (tracking/monitoring
and  tracking/fuel  management).  However,  the  performance  decrement  from 
dual-task  to  multi-task  loadings  was  much  reduced,  and  in  fact  was  significant 
for  only  one  of  the  two  such  contrasts  (see  Figure  4).  On  the  other  hand,  the 
monitoring task showed consistent decreases in performance from single-task
to dual-task to multi-task loadings. This might have resulted from the
operators  using  a  strategy  of  "protecting"  performance  on  the  tracking  task 
under  the  highest  levels  of  load  (e.g.,  Wickens,  1987).  Although  operators  were 
told  that  the  tracking  and  monitoring  tasks  had  equal  priority,  they  may  have 
perceived  the  tracking  task  as  being  more  important  and  allocated  additional 
resources  to  perform  this  task  when  all  three  tasks  had  to  be  performed 
concurrently.  In  contrast,  the  monitoring  and  fuel-management  tasks  may  have 
been  perceived  as  of  secondary  importance;  and  in  fact  performance  on  these 
tasks  did  decrease  in  the  multi-task  condition  (although  significantly  so  only  for 
the monitoring task).

While tracking and monitoring performance changed consistently
with  concurrent  task  demand,  performance  on  the  fuel-management  task  was 
less  consistent.  Inspection  of  the  mean  performance  scores  showed  some 
indication of decrement under concurrent load, i.e., from single-task to dual-task
and from dual-task to multi-task combinations, but the particular
performance  measure  that  we  chose,  fuel-management  RMS  error,  was  highly 
variable,  and  was  not  affected  significantly  by  task  load.  Nevertheless,  the  fuel- 
management  task  did  have  an  impact  on  multi-task  performance,  as  evidenced 
by  performance  decrements  on  the  monitoring  and  tracking  tasks  when  it  was 
paired  singly  or  jointly  with  these  tasks.  Thus,  the  task  did  clearly  contribute  to 
the  overall  processing  demand  imposed  on  the  operator.  Automation  of  this 
task  should  therefore  have  an  impact  on  operator  performance  of  other  tasks  in 
an  adaptive  automation  environment.  From  this  (admittedly  limited) 
perspective,  we  concluded  that  the  fuel-management  task  would  be  useful  in 
our  future  adaptive  automation  studies,  although  the  lack  of  sensitivity  of  the 
task  itself  is  problematic  and  will  require  additional  work  to  resolve. 

Why  was  the  fuel-management  task  not  sensitive  to  task  load? 
There  are  several  possibilities.  First,  the  measure  we  chose  may  not  have  been 
the  best  one.  As  mentioned  earlier,  there  was  very  high  inter-subject  variability 
in  this  performance  measure.  In  Wickens'  (1984)  terms,  this  measure  was  not 
sensitive  to  operator  workload  experienced  in  performing  this  task.  We  are 
currently  exploring  other  ways  of  characterizing  performance  on  this  task. 
Second,  the  fuel-management  task  may  have  been  more  sensitive  to  practice 
effects than the other two tasks. (In our pilot study we investigated practice
effects only for the tracking task.) At least for the RMS error measure, there was
evidence  of  practice  effects  lasting  well  into  the  experimental  session.  These 
practice  effects  may  have  masked  the  effects  of  task  loading.  Third,  of  all  the 
tasks  in  the  MAT,  the  fuel  management  task  is  the  one  that  allows  the  operator 
the greatest flexibility in the way the task is performed. This implies that subjects
probably  used  a  variety  of  different  strategies  to  perform  the  task.  This  in  turn 
could  have  contributed  to  the  variability  in  the  RMS  error  measure. 
Unfortunately,  we  do  not  currently  have  a  way  of  assessing  what  strategies 
were  used.  (Informal  questioning  of  the  subjects  did  not  provide  any  reliable 
information  as  to  strategies  used.)  This  is  clearly  a  point  for  future  research  to 
pursue, particularly in the context of adaptive automation. Automation may
change the way that operators perform tasks, particularly if they are switched
rapidly from manual to automated modes and back, as is possible in an
adaptive automation environment (Parasuraman et al., 1990). Such strategy
changes  need  to  be  better  understood  in  order  to  evaluate  the  impact  of 
adaptive  automation  on  operator  performance. 

Finally,  from  a  purely  theoretical  perspective,  the  present  results 
provide  no  information  on  the  source  of  task  interference  in  multi-task 
performance  (e.g.,  Gopher,  1986;  Kahneman,  1973;  Navon,  1984;  Wickens, 
1987). The performance profiles obtained in the present study are consistent with
operator  limitations  in  perceptual/cognitive  processing  resources  or  in 
structural  factors  (Kahneman,  1973).  The  latter  refers  to  interference  at  the 
input  stage,  for  example  because  of  the  inability  of  the  operator  to  fixate  two 
display  locations  at  the  same  time,  or  to  output  interference,  for  example 
because  the  same  motor  pathway  has  to  be  used  to  execute  responses  to  two 
tasks.  (See  also  Navon,  1984,  for  additional  descriptions  of  input  and  output 
sources  of  interference). 

It  can  be  argued  that  resource  scarcity  rather  than  structural 
interference  was  the  major  source  of  performance  decrement  in  the  present 
study.  At  the  input  end,  all  display  windows  were  capable  of  being  processed 
without  the  need  for  peripheral  vision.  If  the  operator  fixated  the  center  of  the 
tracking  window,  for  example,  then  the  monitoring  and  fuel  management 
windows  were  within  6°  of  visual  angle.  Subjects  clearly  did  make  eye 
movements  to  different  task  windows;  and  there  is  evidence  that  information  is 
processed  less  efficiently  at  non-attended  locations  than  at  attended  locations 
(Posner,  1980).  However,  there  was  no  consistent  evidence  that  subjects 
fixated  or  visually  attended  to  one  display  window  to  the  exclusion  of  others 
(with  the  possible  exception  of  the  tracking  task  in  the  multi-task  condition, 
where,  as  mentioned  previously,  subjects  may  have  attended  more  to  the 
tracking  window  in  order  to  maintain  performance  under  increased  load). 

With  respect  to  the  output  stage  of  information  processing,  the 
input  controls  for  the  different  tasks  were  clearly  defined  and  separated,  and 
were  consistent  with  high  stimulus-response  compatibility,  all  of  which  should 
reduce  the  likelihood  of  output  interference  (Navon,  1984).  The  tracking  task 
was  performed  with  the  right  hand,  and  the  monitoring  and  fuel  management 
tasks  with  the  left  hand.  While  the  same  motor  pathway  was  used  for  the  latter 
two  tasks,  operators  were  rarely  required  to  execute  responses  to  the  two  tasks 
simultaneously.  On  the  rare  occasions  when  both  tasks  required  action  at  about 
the  same  time,  output  interference  was  again  likely  to  be  low  because  the  fuel- 
management  task  was  not  a  reaction-time  task,  and  it  could  be  responded  to 
following  the  monitoring  response  without  a  significant  impact  on  performance. 

Whatever  the  precise  theoretical  reasons  for  the  pattern  of 
performance  decrement  obtained  (and  the  present  study  was  not  designed  to 
distinguish  between  these  alternatives),  the  results  show  that  the  revised  MAT 
tasks  were  sufficiently  diagnostic  of  concurrent  task  demand  on  the  operator. 
Taken  together  with  the  data  from  the  pilot  study,  the  results  established  the 
sensitivity  of  the  tracking,  monitoring,  and  fuel  management  tasks  of  the  revised 
MAT  battery  to  variations  in  task  difficulty  and  task  load.  As  such,  the  present 
study  met  its  objective  of  providing  a  baseline  for  further  studies  of  adaptive 
automation  in  which  task  difficulty  and  task  load  will  be  varied  dynamically. 

CONCLUSIONS 

The  three  studies  conducted  as  part  of  our  initial  investigation  on 
the  effects  of  adaptive  automation  were  successful  in  meeting  most  of  the  start-up 
goals  of  our  research  program.  The  first  study  resulted  in  test  software,  the 
revised  Multi-Attribute  Task  Battery,  that  will  provide  the  platform  for  examining 
performance  effects  of  adaptive  automation.  The  results  of  the  second  and 
third  studies  established  a  baseline  of  empirical  performance  data  in  a  multi-task 
flight-simulation  environment.  These  data  will  help  in  the  design  and 
interpretation  of  results  of  future  adaptive-automation  studies  that  will  be  carried 
out  as  part  of  this  research  program. 